7 research outputs found

    SQLCheck: Automated Detection and Diagnosis of SQL Anti-Patterns

    The emergence of database-as-a-service platforms has made deploying database applications easier than before. Now, developers can quickly create scalable applications. However, designing performant, maintainable, and accurate applications is challenging. Developers may unknowingly introduce anti-patterns in the application's SQL statements. These anti-patterns are design decisions that are intended to solve a problem, but often lead to other problems by violating fundamental design principles. In this paper, we present SQLCheck, a holistic toolchain for automatically finding and fixing anti-patterns in database applications. We introduce techniques for automatically (1) detecting anti-patterns with high precision and recall, (2) ranking the anti-patterns based on their impact on performance, maintainability, and accuracy of applications, and (3) suggesting alternative queries and changes to the database design to fix these anti-patterns. We demonstrate the prevalence of these anti-patterns in a large collection of queries and databases collected from open-source repositories. We introduce an anti-pattern detection algorithm that augments query analysis with data analysis. We present a ranking model for characterizing the impact of frequently occurring anti-patterns. We discuss how SQLCheck suggests fixes for high-impact anti-patterns using rule-based query refactoring techniques. Our experiments demonstrate that SQLCheck enables developers to create more performant, maintainable, and accurate applications. Comment: 18 pages (14 page paper, 1 page references, 2 page Appendix), 12 figures, Conference: SIGMOD'2
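
    As a rough illustration of rule-based detection on query text alone, the sketch below flags two patterns that are commonly cited as SQL anti-patterns. The rule names, the regular expressions, and the `detect` helper are assumptions made for this example. They are not SQLCheck's rules or API, and the sketch leaves out the data analysis, ranking, and fix suggestion that the abstract describes.

```python
import re

# Hypothetical rule table: name -> pattern matched against the raw query text.
RULES = {
    # Selecting every column instead of naming the ones that are needed.
    "column-wildcard": re.compile(r"\bSELECT\s+\*", re.IGNORECASE),
    # A LIKE pattern that starts with a wildcard, which usually defeats an index.
    "leading-wildcard-like": re.compile(r"\bLIKE\s+'%", re.IGNORECASE),
}


def detect(query):
    """Return the sorted names of all rules whose pattern occurs in the query."""
    return sorted(name for name, pattern in RULES.items() if pattern.search(query))


if __name__ == "__main__":
    print(detect("SELECT * FROM users WHERE name LIKE '%son'"))
    print(detect("SELECT id FROM users WHERE id = 7"))
```

    Text-only matching of this kind cannot see the schema or the stored data, which is the gap the abstract's combination of query analysis and data analysis is meant to close.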

    Resiliency: A Consensus Data Binning Method (Short Paper)

    Data binning, or data classification, involves grouping quantitative data points into bins (or classes) to represent spatial patterns and show variation in choropleth maps. There are many methods for binning data (e.g., natural breaks, quantile) that may make the same data appear very different on a map. Some of these methods may be more or less appropriate for certain types of data distributions and map purposes. Thus, when designing a map, novice users may be overwhelmed by the number of choices for binning methods and experts may find comparing results from different binning methods challenging. We present resiliency, a new data binning method that assigns areal units to their most agreed-upon, consensus bin as it persists across multiple chosen binning methods. We show how this "smart average" can effectively communicate spatial patterns that are agreed-upon across binning methods. We also measure the variety of bins a single areal unit can be placed in under different binning methods, which shows fuzziness and uncertainty on a map. We implement resiliency and other binning methods via an open-source JavaScript library, BinGuru.
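
    The consensus idea can be sketched in a few lines, assuming each binning method returns one bin index per areal unit. The two binning functions, the tie-breaking rule (lowest bin wins), and all names below are assumptions for illustration. This is not the BinGuru API, and the published method may resolve disagreements differently.

```python
from collections import Counter
from statistics import quantiles


def equal_interval_bins(values, k):
    """Assign each value to one of k equal-width bins, numbered 0..k-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)
    # The maximum value falls exactly on the upper edge, so clamp it to k-1.
    return [min(int((v - lo) / width), k - 1) for v in values]


def quantile_bins(values, k):
    """Assign each value to one of k quantile bins, numbered 0..k-1."""
    cuts = quantiles(values, n=k)  # k-1 cut points
    return [sum(v > c for c in cuts) for v in values]


def consensus_bins(values, methods, k):
    """Return (consensus bin, number of distinct bins) for every unit."""
    assignments = [method(values, k) for method in methods]
    consensus, variety = [], []
    for per_unit in zip(*assignments):
        counts = Counter(per_unit)
        top = max(counts.values())
        # Assumed tie-break: choose the lowest bin among the most frequent ones.
        consensus.append(min(b for b, c in counts.items() if c == top))
        variety.append(len(counts))
    return consensus, variety


if __name__ == "__main__":
    values = [1, 2, 3, 4, 100]
    print(consensus_bins(values, [equal_interval_bins, quantile_bins], k=2))
```

    The second returned list plays the role of the variety measure: a unit with a count above 1 was placed in different bins by different methods.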

    Vitality: Promoting Serendipitous Discovery of Academic Literature

    Presented at the Georgia Tech Career, Research, and Innovation Development Conference (CRIDC), January 27, 2022. The Career, Research, and Innovation Development Conference (CRIDC) is designed to equip on-campus and online graduate students with tools and knowledge to thrive in an ever-changing job market. There are a few prominent practices for conducting academic literature reviews, including searching for specific keywords on Google Scholar or checking citations from initial seed paper(s). While these approaches serve a critical purpose for academic literature reviews, there remain challenges in identifying relevant literature when (1) different work may utilize the same terminology (e.g., “transformer” in electronics refers to a device that transfers energy between circuits; whereas in computing, it refers to a type of deep learning model, commonly applied to unstructured text data) or (2) similar work may utilize different terminology (e.g., work on “bias” in visualization seldom mentions “uncertainty” even though bias sometimes emerges when people make decisions under uncertainty). We developed a visual analytics system, VitaLITy, to promote serendipitous discovery of academic papers wherein users may “stumble upon” relevant literature, when other search approaches may fail. VitaLITy (1) utilizes transformer language models to help users find semantically similar papers given a list of seed paper(s) or a working abstract, (2) visualizes the embedding space in an interactive 2-D scatterplot, and (3) summarizes meta information about the paper corpus (e.g., keywords, co-authors, citation counts, and publication year). We also curated a comprehensive dataset comprising papers from 38 popular visualization publication venues (e.g., ACM CHI, IEEE VIS) using custom web-scrapers. We have open-sourced the VitaLITy system, dataset, and web-scrapers at https://vitality-vis.github.io/ for the research community to grow the list of supported venues, potentially expanding into other fields, e.g., biology.
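
    Finding semantically similar papers from seed papers can be approximated as nearest-neighbour search over embedding vectors. The sketch below assumes the embeddings already exist and uses tiny made-up 2-D vectors. Averaging the seeds into a centroid, ranking by cosine similarity, and every name here are assumptions for illustration, not the VitaLITy implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(seed_ids, embeddings, top_k=3):
    """Rank non-seed papers by similarity to the average of the seed vectors."""
    dim = len(next(iter(embeddings.values())))
    centroid = [
        sum(embeddings[s][i] for s in seed_ids) / len(seed_ids) for i in range(dim)
    ]
    scored = [
        (paper_id, cosine_similarity(centroid, vector))
        for paper_id, vector in embeddings.items()
        if paper_id not in seed_ids
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


if __name__ == "__main__":
    # Hypothetical embeddings; a real system would obtain them from a language model.
    toy = {"seed": [1.0, 0.0], "near": [0.9, 0.1], "far": [0.0, 1.0]}
    print(most_similar(["seed"], toy, top_k=2))
```

    Because the ranking depends on vector direction rather than shared keywords, two papers that use different terminology for related ideas can still end up close together.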

    Lumos: Increasing Awareness of Biases during Visual Data Analysis

    Presented at the Georgia Tech Career, Research and Innovation Development Conference, 2021, Atlanta, GA. Human biases impact the way people analyze data and make decisions. Dark-skinned people denied parole (racial bias), women denied C-suite promotions (gender bias), and ailing but younger people denied optimal treatment (age bias) are examples of biases rampant in the world. Visual data analysis tools such as Tableau and Excel help users see and understand their data but do not report potential biases exhibited by users (e.g., an overemphasis on the Age attribute). Lumos is an analysis tool that helps users visualize traces of their interactions with data to increase awareness of potential biases. Using in-situ and ex-situ visualization techniques, Lumos provides real-time feedback to users to reflect upon their activities and potentially change future course. For example, Lumos remembers and highlights datapoints that have been previously examined in the same visualizations (in-situ) and overlays the interacted datapoints on the underlying data distribution in a separate visualization (ex-situ). Sometimes, custom policies rather than biases drive decision-making. For example, a university admissions committee selecting more female than male student applicants can be a conscious choice in accordance with the university's gender-equality policy, rather than an unconscious bias. To address these situations, Lumos allows users to configure custom target distributions and accordingly updates the interaction traces. We believe Lumos can improve data exploration and decision-making scenarios to not only help mitigate the dangers of human biases affecting judgements, but also foster more transparent analysis processes. NSF IIS-181328
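
    One simple way to picture the comparison between interaction traces and a target distribution is a per-category gap, sketched below. The gap measure (observed share minus target share), the default target (the underlying data distribution), and all names are assumptions for illustration. The abstract does not specify how Lumos computes or displays this.

```python
from collections import Counter


def distribution(values):
    """Map each category to its share of the given values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def emphasis_gaps(all_values, interacted_values, target=None):
    """Return observed share minus target share for every target category.

    Positive numbers mean the user interacted with a category more often
    than the target distribution would suggest; negative numbers mean less.
    """
    baseline = target if target is not None else distribution(all_values)
    observed = distribution(interacted_values) if interacted_values else {}
    return {
        category: observed.get(category, 0.0) - share
        for category, share in baseline.items()
    }


if __name__ == "__main__":
    applicants = ["F", "F", "M", "M"]   # hypothetical attribute values in the data
    examined = ["M", "M", "M", "F"]     # hypothetical interaction trace
    print(emphasis_gaps(applicants, examined))
    # A custom policy can be expressed as a different target distribution.
    print(emphasis_gaps(applicants, examined, target={"F": 0.6, "M": 0.4}))
```

    Passing a custom `target` mirrors the policy case in the abstract: the same interaction trace is judged against the chosen distribution instead of the data's own.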